Skip to content

Introduce intra-procedural lifetime analysis in Clang #142313

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

usx95
Copy link
Contributor

@usx95 usx95 commented Jun 1, 2025

This patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291

The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches.

Key Components

  • Conceptual Model: Introduces the fundamental concepts of Loan, Origin, and Path to model memory borrows and the lifetime of pointers.
  • Fact Generation: A frontend pass traverses the Clang CFG to generate a representation of lifetime-relevant events, such as pointer assignments, taking an address, and variables going out of scope.
  • Dataflow Lattice: A dataflow lattice used to map each pointer's symbolic Origin to the set of Loans it may contain at any given program point.
  • Fixed-Point Analysis: A worklist-based, flow-sensitive analysis that propagates the lattice state across the CFG to a fixed point.
  • Testing: llvm-lit tests validate the analysis by checking the generated facts and final dataflow state.

Next Steps

(Not covered in this PR but planned for subsequent patches)

The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:

  • Placeholder Loans: Introduce placeholder loans to represent the lifetimes of function parameters, forming the basis for analysis involving function calls.
  • Annotation and Opaque Call Handling: Use placeholder loans to correctly model function calls, both by respecting [[clang::lifetimebound]] annotations and by conservatively handling opaque/un-annotated functions.
  • Error Reporting: Implement the final analysis phase that consumes the dataflow results to generate user-facing diagnostics. This will likely require liveness analysis to identify live origins holding expired loans.
  • Strict vs. Permissive Modes: Add the logic to support both high-confidence (permissive) and more comprehensive (strict) warning levels.
  • Expanded C++ Coverage: Broaden support for common patterns, including the lifetimes of temporary objects and pointers within aggregate types (structs/containers).
  • Performance benchmarking
  • Capping number of iterations or number of times a CFGBlock is processed.

Performance on pathological test cases:

The pathological case arise when we have N origins initially holding N loans and we have a cyclic assignment of these origins in a loop. The fixed point reaches after N iterations when all the origins contain all the loans.
For N = 4, the test case would like like:

struct MyObj {
  int id;
  ~MyObj() {}
};

void long_cycle(bool condition) {
  MyObj v1{1};
  MyObj v2{1};
  MyObj v3{1};
  MyObj v4{1};

  MyObj* p1 = &v1;
  MyObj* p2 = &v2;
  MyObj* p3 = &v3;
  MyObj* p4 = &v4;

  while (condition) {
    MyObj* temp = p1;
    p1 = p2;
    p2 = p3;
    p3 = p4;
    p4 = temp;
  }
}

@usx95 usx95 force-pushed the dangling-references-latest branch from c8f6277 to 3624359 Compare June 2, 2025 13:06
Copy link

github-actions bot commented Jun 2, 2025

✅ With the latest revision this PR passed the Python code formatter.

@usx95 usx95 force-pushed the dangling-references-latest branch 4 times, most recently from 668e329 to 7b929ee Compare June 2, 2025 23:41
@usx95 usx95 changed the title Introduce Intra-procedural lifetime analysis in Clang Introduce intra-procedural lifetime analysis in Clang Jun 2, 2025
@usx95 usx95 force-pushed the dangling-references-latest branch 2 times, most recently from 136238f to 942648e Compare June 3, 2025 13:52
@usx95 usx95 requested review from ymand, gribozavr and jvoung June 4, 2025 14:32
@usx95 usx95 force-pushed the dangling-references-latest branch from 942648e to cb7bd7f Compare June 5, 2025 13:52
@usx95 usx95 force-pushed the dangling-references-latest branch from cb7bd7f to 0cd187b Compare June 5, 2025 13:57
@usx95 usx95 marked this pull request as ready for review June 10, 2025 05:49
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:analysis labels Jun 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 10, 2025

@llvm/pr-subscribers-clang-analysis

Author: Utkarsh Saxena (usx95)

Changes

This patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291

The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches.

Key Components

  • Conceptual Model: Introduces the fundamental concepts of Loan, Origin, and Path to model memory borrows and the lifetime of pointers.
  • Fact Generation: A frontend pass traverses the Clang CFG to generate a representation of lifetime-relevant events, such as pointer assignments, taking an address, and variables going out of scope.
  • Dataflow Lattice: A dataflow lattice used to map each pointer's symbolic Origin to the set of Loans it may contain at any given program point.
  • Fixed-Point Analysis: A worklist-based, flow-sensitive analysis that propagates the lattice state across the CFG to a fixed point.
  • Testing: llvm-lit tests validate the analysis by checking the generated facts and final dataflow state.

Next Steps

(Not covered in this PR but planned for subsequent patches)

The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:

  • Placeholder Loans: Introduce placeholder loans to represent the lifetimes of function parameters, forming the basis for analysis involving function calls.
  • Annotation and Opaque Call Handling: Use placeholder loans to correctly model function calls, both by respecting [[clang::lifetimebound]] annotations and by conservatively handling opaque/un-annotated functions.
  • Error Reporting: Implement the final analysis phase that consumes the dataflow results to generate user-facing diagnostics.
  • Strict vs. Permissive Modes: Add the logic to support both high-confidence (permissive) and more comprehensive (strict) warning levels.
  • Expanded C++ Coverage: Broaden support for common patterns, including the lifetimes of temporary objects and pointers within aggregate types (structs/containers).
  • Performance benchmarking
  • Capping number of iterations or number of times a CFGBlock is processed.

<details>
<summary>

Performance on pathological test cases:

</summary>

The pathological case arise when we have N origins initially holding N loans and we have a cyclic assignment of these origins in a loop. The fixed point reaches after N iterations when all the origins contain all the loans.
For N = 4, the test case would like like:

struct MyObj {
  int id;
  ~MyObj() {}
};

void long_cycle(bool condition) {
  MyObj v1{1};
  MyObj v2{1};
  MyObj v3{1};
  MyObj v4{1};

  MyObj* p1 = &amp;v1;
  MyObj* p2 = &amp;v2;
  MyObj* p3 = &amp;v3;
  MyObj* p4 = &amp;v4;

  while (condition) {
    MyObj* temp = p1;
    p1 = p2;
    p2 = p3;
    p3 = p4;
    p4 = temp;
  }
}

</details>


Patch is 41.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142313.diff

5 Files Affected:

  • (added) clang/include/clang/Analysis/Analyses/LifetimeSafety.h (+13)
  • (modified) clang/lib/Analysis/CMakeLists.txt (+1)
  • (added) clang/lib/Analysis/LifetimeSafety.cpp (+728)
  • (modified) clang/lib/Sema/AnalysisBasedWarnings.cpp (+8)
  • (added) clang/test/Sema/warn-lifetime-safety-dataflow.cpp (+362)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
new file mode 100644
index 0000000000000..daf24fff72b9b
--- /dev/null
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
@@ -0,0 +1,13 @@
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+namespace clang {
+
+void runLifetimeAnalysis(const DeclContext &DC, const CFG &Cfg,
+                         AnalysisDeclContext &AC);
+
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 8cd3990db4c3e..0523d92480cb3 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -21,6 +21,7 @@ add_clang_library(clangAnalysis
   FixitUtil.cpp
   IntervalPartition.cpp
   IssueHash.cpp
+  LifetimeSafety.cpp
   LiveVariables.cpp
   MacroExpansionContext.cpp
   ObjCNoReturn.cpp
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
new file mode 100644
index 0000000000000..8dbb4dc37026c
--- /dev/null
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -0,0 +1,728 @@
+#include "clang/Analysis/Analyses/LifetimeSafety.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/Type.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Analysis/FlowSensitive/DataflowWorklist.h"
+#include "llvm/ADT/ImmutableMap.h"
+#include "llvm/ADT/ImmutableSet.h"
+#include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/TimeProfiler.h"
+#include <vector>
+
+namespace clang {
+namespace {
+
+struct Point {
+  const clang::CFGBlock *Block;
+  /// Index into Block->Elements().
+  unsigned ElementIndex;
+
+  Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
+      : Block(B), ElementIndex(Idx) {}
+
+  bool operator==(const Point &Other) const {
+    return Block == Other.Block && ElementIndex == Other.ElementIndex;
+  }
+};
+
+/// Represents the storage location being borrowed, e.g., a specific stack
+/// variable.
+/// TODO: Handle member accesseslike `s.y`.
+struct Path {
+  const clang::ValueDecl *D;
+
+  enum class Kind : uint8_t {
+    StackVariable,
+    Heap,            // TODO: Handle.
+    Field,           // TODO: Handle.
+    ArrayElement,    // TODO: Handle.
+    TemporaryObject, // TODO: Handle.
+    StaticOrGlobal,  // TODO: Handle.
+  };
+
+  Kind PathKind;
+
+  Path(const clang::ValueDecl *D, Kind K) : D(D), PathKind(K) {}
+};
+
+using LoanID = uint32_t;
+using OriginID = uint32_t;
+
+/// Information about a single borrow, or "Loan". A loan is created when a
+/// reference or pointer is taken.
+struct LoanInfo {
+  /// TODO: Represent opaque loans.
+  /// TODO: Represent nullptr: loans to no path. Accessing it UB! Currently it
+  /// is represented as empty LoanSet
+  LoanID ID;
+  Path SourcePath;
+  SourceLocation IssueLoc;
+
+  LoanInfo(LoanID id, Path path, SourceLocation loc)
+      : ID(id), SourcePath(path), IssueLoc(loc) {}
+};
+
+enum class OriginKind : uint8_t { Variable, ExpressionResult };
+
+/// An Origin is a symbolic identifier that represents the set of possible
+/// loans a pointer-like object could hold at any given time.
+/// TODO: Also represent Origins of complex types (fields, inner types).
+struct OriginInfo {
+  OriginID ID;
+  OriginKind Kind;
+  union {
+    const clang::ValueDecl *Decl;
+    const clang::Expr *Expression;
+  };
+  OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
+      : ID(id), Kind(kind), Decl(D) {}
+  OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
+      : ID(id), Kind(kind), Expression(E) {}
+};
+
+class LoanManager {
+public:
+  LoanManager() = default;
+
+  LoanInfo &addLoanInfo(Path path, SourceLocation loc) {
+    NextLoanIDVal++;
+    AllLoans.emplace_back(NextLoanIDVal, path, loc);
+    return AllLoans.back();
+  }
+
+  const LoanInfo *getLoanInfo(LoanID id) const {
+    if (id < AllLoans.size())
+      return &AllLoans[id];
+    return nullptr;
+  }
+  llvm::ArrayRef<LoanInfo> getLoanInfos() const { return AllLoans; }
+
+private:
+  LoanID NextLoanIDVal = 0;
+  llvm::SmallVector<LoanInfo> AllLoans;
+};
+
+class OriginManager {
+public:
+  OriginManager() = default;
+
+  OriginID getNextOriginID() { return NextOriginIDVal++; }
+  OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) {
+    assert(D != nullptr);
+    AllOrigins.emplace_back(id, OriginKind::Variable, D);
+    return AllOrigins.back();
+  }
+  OriginInfo &addOriginInfo(OriginID id, const clang::Expr *E) {
+    assert(E != nullptr);
+    AllOrigins.emplace_back(id, OriginKind::ExpressionResult, E);
+    return AllOrigins.back();
+  }
+
+  OriginID getOrCreate(const Expr *E) {
+    auto It = ExprToOriginID.find(E);
+    if (It != ExprToOriginID.end())
+      return It->second;
+
+    if (const auto *DRE = dyn_cast<DeclRefExpr>(E)) {
+      // Origin of DeclRefExpr is that of the declaration it refers to.
+      return getOrCreate(DRE->getDecl());
+    }
+    OriginID NewID = getNextOriginID();
+    addOriginInfo(NewID, E);
+    ExprToOriginID[E] = NewID;
+    return NewID;
+  }
+
+  const OriginInfo *getOriginInfo(OriginID id) const {
+    if (id < AllOrigins.size())
+      return &AllOrigins[id];
+    return nullptr;
+  }
+
+  llvm::ArrayRef<OriginInfo> getOriginInfos() const { return AllOrigins; }
+
+  OriginID getOrCreate(const ValueDecl *D) {
+    auto It = DeclToOriginID.find(D);
+    if (It != DeclToOriginID.end())
+      return It->second;
+    OriginID NewID = getNextOriginID();
+    addOriginInfo(NewID, D);
+    DeclToOriginID[D] = NewID;
+    return NewID;
+  }
+
+private:
+  OriginID NextOriginIDVal = 0;
+  llvm::SmallVector<OriginInfo> AllOrigins;
+  llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
+  llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+};
+
+/// An abstract base class for a single, atomic lifetime-relevant event.
+class Fact {
+
+public:
+  enum class Kind : uint8_t {
+    /// A new loan is issued from a borrow expression (e.g., &x).
+    Issue,
+    /// A loan expires as its underlying storage is freed (e.g., variable goes
+    /// out of scope).
+    Expire,
+    /// An origin is propagated from a source to a destination (e.g., p = q).
+    AssignOrigin,
+    /// An origin is part of a function's return value.
+    ReturnOfOrigin
+  };
+
+private:
+  Kind K;
+
+protected:
+  Point P;
+  Fact(Kind K, Point Pt) : K(K), P(Pt) {}
+
+public:
+  virtual ~Fact() = default;
+  Kind getKind() const { return K; }
+  Point getPoint() const { return P; }
+
+  template <typename T> const T *getAs() const {
+    if (T::classof(this))
+      return static_cast<const T *>(this);
+    return nullptr;
+  }
+
+  virtual void dump(llvm::raw_ostream &OS) const {
+    OS << "Fact (Kind: " << static_cast<int>(K) << ", Point: B"
+       << P.Block->getBlockID() << ":" << P.ElementIndex << ")\n";
+  }
+};
+
+class IssueFact : public Fact {
+  LoanID LID;
+  OriginID OID;
+
+public:
+  IssueFact(LoanID LID, OriginID OID, Point Pt)
+      : Fact(Kind::Issue, Pt), LID(LID), OID(OID) {}
+  LoanID getLoanID() const { return LID; }
+  OriginID getOriginID() const { return OID; }
+  static bool classof(const Fact *F) { return F->getKind() == Kind::Issue; }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "Issue (LoanID: " << getLoanID() << ", OriginID: " << getOriginID()
+       << ")\n";
+  }
+};
+
+class ExpireFact : public Fact {
+  LoanID LID;
+
+public:
+  ExpireFact(LoanID LID, Point Pt) : Fact(Kind::Expire, Pt), LID(LID) {}
+  LoanID getLoanID() const { return LID; }
+  static bool classof(const Fact *F) { return F->getKind() == Kind::Expire; }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "Expire (LoanID: " << getLoanID() << ")\n";
+  }
+};
+
+class AssignOriginFact : public Fact {
+  OriginID OIDDest;
+  OriginID OIDSrc;
+
+public:
+  AssignOriginFact(OriginID OIDDest, OriginID OIDSrc, Point Pt)
+      : Fact(Kind::AssignOrigin, Pt), OIDDest(OIDDest), OIDSrc(OIDSrc) {}
+  OriginID getDestOriginID() const { return OIDDest; }
+  OriginID getSrcOriginID() const { return OIDSrc; }
+  static bool classof(const Fact *F) {
+    return F->getKind() == Kind::AssignOrigin;
+  }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "AssignOrigin (DestID: " << getDestOriginID()
+       << ", SrcID: " << getSrcOriginID() << ")\n";
+  }
+};
+
+class ReturnOfOriginFact : public Fact {
+  OriginID OID;
+
+public:
+  ReturnOfOriginFact(OriginID OID, Point Pt)
+      : Fact(Kind::ReturnOfOrigin, Pt), OID(OID) {}
+  OriginID getReturnedOriginID() const { return OID; }
+  static bool classof(const Fact *F) {
+    return F->getKind() == Kind::ReturnOfOrigin;
+  }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "ReturnOfOrigin (OriginID: " << getReturnedOriginID() << ")\n";
+  }
+};
+
+class FactManager {
+public:
+  llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const {
+    auto It = BlockToFactsMap.find(B);
+    if (It != BlockToFactsMap.end())
+      return It->second;
+    return {};
+  }
+
+  void addFact(const CFGBlock *B, Fact *NewFact) {
+    BlockToFactsMap[B].push_back(NewFact);
+  }
+
+  template <typename FactType, typename... Args>
+  FactType *createFact(Args &&...args) {
+    void *Mem = FactAllocator.Allocate<FactType>();
+    return new (Mem) FactType(std::forward<Args>(args)...);
+  }
+
+  void dump(const CFG &Cfg, AnalysisDeclContext &AC) const {
+    llvm::dbgs() << "==========================================\n";
+    llvm::dbgs() << "       Lifetime Analysis Facts:\n";
+    llvm::dbgs() << "==========================================\n";
+    if (const Decl *D = AC.getDecl()) {
+      if (const auto *ND = dyn_cast<NamedDecl>(D))
+        llvm::dbgs() << "Function: " << ND->getQualifiedNameAsString() << "\n";
+    }
+    // Print blocks in the order as they appear in code for a stable ordering.
+    ForwardDataflowWorklist worklist(Cfg, AC);
+    for (const CFGBlock *B : Cfg.const_nodes())
+      worklist.enqueueBlock(B);
+    while (const CFGBlock *B = worklist.dequeue()) {
+      llvm::dbgs() << "  Block B" << B->getBlockID() << ":\n";
+      auto It = BlockToFactsMap.find(B);
+      if (It != BlockToFactsMap.end()) {
+        for (const Fact *F : It->second) {
+          llvm::dbgs() << "    ";
+          F->dump(llvm::dbgs());
+        }
+      }
+      llvm::dbgs() << "  End of Block\n";
+    }
+  }
+
+private:
+  llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<Fact *>>
+      BlockToFactsMap;
+  llvm::BumpPtrAllocator FactAllocator;
+};
+
+struct FactsContext {
+  FactManager Facts;
+  LoanManager Loans;
+  OriginManager Origins;
+};
+
+class FactGenerator : public ConstStmtVisitor<FactGenerator> {
+
+public:
+  FactGenerator(const CFG &Cfg, FactsContext &FactsCtx, AnalysisDeclContext &AC)
+      : FactsCtx(FactsCtx), Cfg(Cfg), AC(AC), CurrentBlock(nullptr) {}
+
+  void run() {
+    llvm::TimeTraceScope TimeProfile("Fact Generation");
+    // Iterate through the CFG blocks in pre-order to ensure that
+    // initializations and destructions are processed in the correct sequence.
+    // TODO: A pre-order traversal utility should be provided by Dataflow
+    // framework.
+    ForwardDataflowWorklist Worklist(Cfg, AC);
+    for (const CFGBlock *B : Cfg.const_nodes())
+      Worklist.enqueueBlock(B);
+    while (const CFGBlock *Block = Worklist.dequeue()) {
+      CurrentBlock = Block;
+      for (unsigned I = 0; I < Block->size(); ++I) {
+        const CFGElement &Element = Block->Elements[I];
+        CurrentPoint = Point(Block, I);
+        if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
+          Visit(CS->getStmt());
+        else if (std::optional<CFGAutomaticObjDtor> DtorOpt =
+                     Element.getAs<CFGAutomaticObjDtor>())
+          handleDestructor(*DtorOpt);
+      }
+    }
+  }
+
+  void VisitDeclStmt(const DeclStmt *DS) {
+    for (const Decl *D : DS->decls()) {
+      if (const auto *VD = dyn_cast<VarDecl>(D)) {
+        if (hasOrigin(VD->getType())) {
+          if (const Expr *InitExpr = VD->getInit()) {
+            OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
+            OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
+            FactsCtx.Facts.addFact(CurrentBlock,
+                                   FactsCtx.Facts.createFact<AssignOriginFact>(
+                                       DestOID, SrcOID, CurrentPoint));
+          }
+        }
+      }
+    }
+  }
+
+  void VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
+    if (!hasOrigin(ICE->getType()))
+      return;
+    // An ImplicitCastExpr node itself gets an origin, which flows from the
+    // origin of its sub-expression (after stripping its own parens/casts).
+    if (ICE->getCastKind() == CK_LValueToRValue) {
+      OriginID IceOID = FactsCtx.Origins.getOrCreate(ICE);
+      OriginID SubExprOID = FactsCtx.Origins.getOrCreate(ICE->getSubExpr());
+      FactsCtx.Facts.addFact(CurrentBlock,
+                             FactsCtx.Facts.createFact<AssignOriginFact>(
+                                 IceOID, SubExprOID, CurrentPoint));
+    }
+  }
+
+  void VisitUnaryOperator(const UnaryOperator *UO) {
+    if (UO->getOpcode() == UO_AddrOf) {
+      const Expr *SubExpr = UO->getSubExpr();
+      if (const auto *DRE = dyn_cast<DeclRefExpr>(SubExpr)) {
+        if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+          // Check if it's a local variable.
+          if (VD->hasLocalStorage()) {
+            OriginID OID = FactsCtx.Origins.getOrCreate(UO);
+            Path AddrOfLocalVarPath(VD, Path::Kind::StackVariable);
+            LoanInfo &Loan = FactsCtx.Loans.addLoanInfo(AddrOfLocalVarPath,
+                                                        UO->getOperatorLoc());
+            FactsCtx.Facts.addFact(CurrentBlock,
+                                   FactsCtx.Facts.createFact<IssueFact>(
+                                       Loan.ID, OID, CurrentPoint));
+          }
+        }
+      }
+    }
+  }
+
+  void VisitReturnStmt(const ReturnStmt *RS) {
+    if (const Expr *RetExpr = RS->getRetValue()) {
+      if (hasOrigin(RetExpr->getType())) {
+        OriginID OID = FactsCtx.Origins.getOrCreate(RetExpr);
+        FactsCtx.Facts.addFact(
+            CurrentBlock,
+            FactsCtx.Facts.createFact<ReturnOfOriginFact>(OID, CurrentPoint));
+      }
+    }
+  }
+
+  void VisitBinaryOperator(const BinaryOperator *BO) {
+    if (BO->isAssignmentOp()) {
+      const Expr *LHSExpr = BO->getLHS();
+      const Expr *RHSExpr = BO->getRHS();
+
+      // We are interested in assignments like `ptr1 = ptr2` or `ptr = &var`
+      // LHS must be a pointer/reference type that can be an origin.
+      // RHS must also represent an origin (either another pointer/ref or an
+      // address-of).
+      if (const auto *DRE_LHS = dyn_cast<DeclRefExpr>(LHSExpr)) {
+        if (const auto *VD_LHS =
+                dyn_cast<ValueDecl>(DRE_LHS->getDecl()->getCanonicalDecl());
+            VD_LHS && hasOrigin(VD_LHS->getType())) {
+          OriginID DestOID = FactsCtx.Origins.getOrCreate(VD_LHS);
+          OriginID SrcOID = FactsCtx.Origins.getOrCreate(RHSExpr);
+          FactsCtx.Facts.addFact(CurrentBlock,
+                                 FactsCtx.Facts.createFact<AssignOriginFact>(
+                                     DestOID, SrcOID, CurrentPoint));
+        }
+      }
+    }
+  }
+
+private:
+  // Check if a type have an origin.
+  bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); }
+
+  void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) {
+    /// TODO: Also handle trivial destructors (e.g., for `int`
+    // variables) which will never have a CFGAutomaticObjDtor node.
+    /// TODO: Handle loans to temporaries.
+    const VarDecl *DestructedVD = DtorOpt.getVarDecl();
+    if (!DestructedVD)
+      return;
+    // Iterate through all loans to see if any expire.
+    for (const LoanInfo &Loan : FactsCtx.Loans.getLoanInfos()) {
+      const Path &LoanPath = Loan.SourcePath;
+      // Check if the loan is for a stack variable and if that variable
+      // is the one being destructed.
+      if (LoanPath.PathKind == Path::Kind::StackVariable) {
+        if (LoanPath.D == DestructedVD) {
+          FactsCtx.Facts.addFact(
+              CurrentBlock,
+              FactsCtx.Facts.createFact<ExpireFact>(Loan.ID, CurrentPoint));
+        }
+      }
+    }
+  }
+
+  FactsContext &FactsCtx;
+  const CFG &Cfg;
+  AnalysisDeclContext &AC;
+  const CFGBlock *CurrentBlock;
+  Point CurrentPoint;
+};
+
+// ========================================================================= //
+//                              The Dataflow Lattice
+// ========================================================================= //
+
+// Using LLVM's immutable collections is efficient for dataflow analysis
+// as it avoids deep copies during state transitions.
+// TODO(opt): Consider using a bitset to represent the set of loans.
+using LoanSet = llvm::ImmutableSet<LoanID>;
+using OriginLoanMap = llvm::ImmutableMap<OriginID, LoanSet>;
+
+/// A context object to hold the factories for immutable collections, ensuring
+/// that all created states share the same underlying memory management.
+struct LifetimeFactory {
+  OriginLoanMap::Factory OriginMapFact;
+  LoanSet::Factory LoanSetFact;
+
+  LoanSet createLoanSet(LoanID LID) {
+    return LoanSetFact.add(LoanSetFact.getEmptySet(), LID);
+  }
+};
+
+/// LifetimeLattice represents the state of our analysis at a given program
+/// point. It is an immutable object, and all operations produce a new
+/// instance rather than modifying the existing one.
+struct LifetimeLattice {
+  /// The map from an origin to the set of loans it contains.
+  /// TODO(opt): To reduce the lattice size, propagate origins of declarations,
+  /// not expressions, because expressions are not visible across blocks.
+  OriginLoanMap Origins = OriginLoanMap(nullptr);
+
+  explicit LifetimeLattice(const OriginLoanMap &S) : Origins(S) {}
+  LifetimeLattice() = default;
+
+  bool operator==(const LifetimeLattice &Other) const {
+    return Origins == Other.Origins;
+  }
+  bool operator!=(const LifetimeLattice &Other) const {
+    return !(*this == Other);
+  }
+
+  LoanSet getLoans(OriginID OID, LifetimeFactory &Factory) const {
+    if (auto *Loans = Origins.lookup(OID))
+      return *Loans;
+    return Factory.LoanSetFact.getEmptySet();
+  }
+
+  /// Computes the union of two lattices by performing a key-wise merge of
+  /// their OriginLoanMaps.
+  LifetimeLattice merge(const LifetimeLattice &Other,
+                        LifetimeFactory &Factory) const {
+    /// Merge the smaller map into the larger one ensuring we iterate over the
+    /// smaller map.
+    if (Origins.getHeight() < Other.Origins.getHeight())
+      return Other.merge(*this, Factory);
+
+    OriginLoanMap MergedState = Origins;
+    // For each origin in the other map, union its loan set with ours.
+    for (const auto &Entry : Other.Origins) {
+      OriginID OID = Entry.first;
+      LoanSet OtherLoanSet = Entry.second;
+      MergedState = Factory.OriginMapFact.add(
+          MergedState, OID,
+          merge(getLoans(OID, Factory), OtherLoanSet, Factory));
+    }
+    return LifetimeLattice(MergedState);
+  }
+
+  LoanSet merge(LoanSet a, LoanSet b, LifetimeFactory &Factory) const {
+    /// Merge the smaller set into the larger one ensuring we iterate over the
+    /// smaller set.
+    if (a.getHeight() < b.getHeight())
+      std::swap(a, b);
+    LoanSet Result = a;
+    for (LoanID LID : b) {
+      /// TODO(opt): Profiling shows that this loop is a major performance
+      /// bottleneck. Investigate using a BitVector to represent the set of
+      /// loans for improved merge performance.
+      Result = Factory.LoanSetFact.add(Result, LID);
+    }
+    return Result;
+  }
+
+  void dump(llvm::raw_ostream &OS) const {
+    OS << "LifetimeLattice State:\n";
+    if (Origins.isEmpty())
+      OS << "  <empty>\n";
+    for (const auto &Entry : Origi...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jun 10, 2025

@llvm/pr-subscribers-clang

Author: Utkarsh Saxena (usx95)

Changes

This patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291

The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches.

Key Components

  • Conceptual Model: Introduces the fundamental concepts of Loan, Origin, and Path to model memory borrows and the lifetime of pointers.
  • Fact Generation: A frontend pass traverses the Clang CFG to generate a representation of lifetime-relevant events, such as pointer assignments, taking an address, and variables going out of scope.
  • Dataflow Lattice: A dataflow lattice used to map each pointer's symbolic Origin to the set of Loans it may contain at any given program point.
  • Fixed-Point Analysis: A worklist-based, flow-sensitive analysis that propagates the lattice state across the CFG to a fixed point.
  • Testing: llvm-lit tests validate the analysis by checking the generated facts and final dataflow state.

Next Steps

(Not covered in this PR but planned for subsequent patches)

The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:

  • Placeholder Loans: Introduce placeholder loans to represent the lifetimes of function parameters, forming the basis for analysis involving function calls.
  • Annotation and Opaque Call Handling: Use placeholder loans to correctly model function calls, both by respecting [[clang::lifetimebound]] annotations and by conservatively handling opaque/un-annotated functions.
  • Error Reporting: Implement the final analysis phase that consumes the dataflow results to generate user-facing diagnostics.
  • Strict vs. Permissive Modes: Add the logic to support both high-confidence (permissive) and more comprehensive (strict) warning levels.
  • Expanded C++ Coverage: Broaden support for common patterns, including the lifetimes of temporary objects and pointers within aggregate types (structs/containers).
  • Performance benchmarking
  • Capping number of iterations or number of times a CFGBlock is processed.

<details>
<summary>

Performance on pathological test cases:

</summary>

The pathological case arise when we have N origins initially holding N loans and we have a cyclic assignment of these origins in a loop. The fixed point reaches after N iterations when all the origins contain all the loans.
For N = 4, the test case would like like:

struct MyObj {
  int id;
  ~MyObj() {}
};

void long_cycle(bool condition) {
  MyObj v1{1};
  MyObj v2{1};
  MyObj v3{1};
  MyObj v4{1};

  MyObj* p1 = &amp;v1;
  MyObj* p2 = &amp;v2;
  MyObj* p3 = &amp;v3;
  MyObj* p4 = &amp;v4;

  while (condition) {
    MyObj* temp = p1;
    p1 = p2;
    p2 = p3;
    p3 = p4;
    p4 = temp;
  }
}

</details>


Patch is 41.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142313.diff

5 Files Affected:

  • (added) clang/include/clang/Analysis/Analyses/LifetimeSafety.h (+13)
  • (modified) clang/lib/Analysis/CMakeLists.txt (+1)
  • (added) clang/lib/Analysis/LifetimeSafety.cpp (+728)
  • (modified) clang/lib/Sema/AnalysisBasedWarnings.cpp (+8)
  • (added) clang/test/Sema/warn-lifetime-safety-dataflow.cpp (+362)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
new file mode 100644
index 0000000000000..daf24fff72b9b
--- /dev/null
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
@@ -0,0 +1,13 @@
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+namespace clang {
+
+void runLifetimeAnalysis(const DeclContext &DC, const CFG &Cfg,
+                         AnalysisDeclContext &AC);
+
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 8cd3990db4c3e..0523d92480cb3 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -21,6 +21,7 @@ add_clang_library(clangAnalysis
   FixitUtil.cpp
   IntervalPartition.cpp
   IssueHash.cpp
+  LifetimeSafety.cpp
   LiveVariables.cpp
   MacroExpansionContext.cpp
   ObjCNoReturn.cpp
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
new file mode 100644
index 0000000000000..8dbb4dc37026c
--- /dev/null
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -0,0 +1,728 @@
+#include "clang/Analysis/Analyses/LifetimeSafety.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/Type.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Analysis/FlowSensitive/DataflowWorklist.h"
+#include "llvm/ADT/ImmutableMap.h"
+#include "llvm/ADT/ImmutableSet.h"
+#include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/TimeProfiler.h"
+#include <vector>
+
+namespace clang {
+namespace {
+
+struct Point {
+  const clang::CFGBlock *Block;
+  /// Index into Block->Elements().
+  unsigned ElementIndex;
+
+  Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
+      : Block(B), ElementIndex(Idx) {}
+
+  bool operator==(const Point &Other) const {
+    return Block == Other.Block && ElementIndex == Other.ElementIndex;
+  }
+};
+
+/// Represents the storage location being borrowed, e.g., a specific stack
+/// variable.
+/// TODO: Handle member accesseslike `s.y`.
+struct Path {
+  const clang::ValueDecl *D;
+
+  enum class Kind : uint8_t {
+    StackVariable,
+    Heap,            // TODO: Handle.
+    Field,           // TODO: Handle.
+    ArrayElement,    // TODO: Handle.
+    TemporaryObject, // TODO: Handle.
+    StaticOrGlobal,  // TODO: Handle.
+  };
+
+  Kind PathKind;
+
+  Path(const clang::ValueDecl *D, Kind K) : D(D), PathKind(K) {}
+};
+
+using LoanID = uint32_t;
+using OriginID = uint32_t;
+
+/// Information about a single borrow, or "Loan". A loan is created when a
+/// reference or pointer is taken.
+struct LoanInfo {
+  /// TODO: Represent opaque loans.
+  /// TODO: Represent nullptr: loans to no path. Accessing it UB! Currently it
+  /// is represented as empty LoanSet
+  LoanID ID;
+  Path SourcePath;
+  SourceLocation IssueLoc;
+
+  LoanInfo(LoanID id, Path path, SourceLocation loc)
+      : ID(id), SourcePath(path), IssueLoc(loc) {}
+};
+
+enum class OriginKind : uint8_t { Variable, ExpressionResult };
+
+/// An Origin is a symbolic identifier that represents the set of possible
+/// loans a pointer-like object could hold at any given time.
+/// TODO: Also represent Origins of complex types (fields, inner types).
+struct OriginInfo {
+  OriginID ID;
+  OriginKind Kind;
+  union {
+    const clang::ValueDecl *Decl;
+    const clang::Expr *Expression;
+  };
+  OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
+      : ID(id), Kind(kind), Decl(D) {}
+  OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
+      : ID(id), Kind(kind), Expression(E) {}
+};
+
+class LoanManager {
+public:
+  LoanManager() = default;
+
+  LoanInfo &addLoanInfo(Path path, SourceLocation loc) {
+    NextLoanIDVal++;
+    AllLoans.emplace_back(NextLoanIDVal, path, loc);
+    return AllLoans.back();
+  }
+
+  const LoanInfo *getLoanInfo(LoanID id) const {
+    if (id < AllLoans.size())
+      return &AllLoans[id];
+    return nullptr;
+  }
+  llvm::ArrayRef<LoanInfo> getLoanInfos() const { return AllLoans; }
+
+private:
+  LoanID NextLoanIDVal = 0;
+  llvm::SmallVector<LoanInfo> AllLoans;
+};
+
+class OriginManager {
+public:
+  OriginManager() = default;
+
+  OriginID getNextOriginID() { return NextOriginIDVal++; }
+  OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) {
+    assert(D != nullptr);
+    AllOrigins.emplace_back(id, OriginKind::Variable, D);
+    return AllOrigins.back();
+  }
+  OriginInfo &addOriginInfo(OriginID id, const clang::Expr *E) {
+    assert(E != nullptr);
+    AllOrigins.emplace_back(id, OriginKind::ExpressionResult, E);
+    return AllOrigins.back();
+  }
+
+  OriginID getOrCreate(const Expr *E) {
+    auto It = ExprToOriginID.find(E);
+    if (It != ExprToOriginID.end())
+      return It->second;
+
+    if (const auto *DRE = dyn_cast<DeclRefExpr>(E)) {
+      // Origin of DeclRefExpr is that of the declaration it refers to.
+      return getOrCreate(DRE->getDecl());
+    }
+    OriginID NewID = getNextOriginID();
+    addOriginInfo(NewID, E);
+    ExprToOriginID[E] = NewID;
+    return NewID;
+  }
+
+  const OriginInfo *getOriginInfo(OriginID id) const {
+    if (id < AllOrigins.size())
+      return &AllOrigins[id];
+    return nullptr;
+  }
+
+  llvm::ArrayRef<OriginInfo> getOriginInfos() const { return AllOrigins; }
+
+  OriginID getOrCreate(const ValueDecl *D) {
+    auto It = DeclToOriginID.find(D);
+    if (It != DeclToOriginID.end())
+      return It->second;
+    OriginID NewID = getNextOriginID();
+    addOriginInfo(NewID, D);
+    DeclToOriginID[D] = NewID;
+    return NewID;
+  }
+
+private:
+  OriginID NextOriginIDVal = 0;
+  llvm::SmallVector<OriginInfo> AllOrigins;
+  llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
+  llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+};
+
+/// An abstract base class for a single, atomic lifetime-relevant event.
+class Fact {
+
+public:
+  enum class Kind : uint8_t {
+    /// A new loan is issued from a borrow expression (e.g., &x).
+    Issue,
+    /// A loan expires as its underlying storage is freed (e.g., variable goes
+    /// out of scope).
+    Expire,
+    /// An origin is propagated from a source to a destination (e.g., p = q).
+    AssignOrigin,
+    /// An origin is part of a function's return value.
+    ReturnOfOrigin
+  };
+
+private:
+  Kind K;
+
+protected:
+  Point P;
+  Fact(Kind K, Point Pt) : K(K), P(Pt) {}
+
+public:
+  virtual ~Fact() = default;
+  Kind getKind() const { return K; }
+  Point getPoint() const { return P; }
+
+  template <typename T> const T *getAs() const {
+    if (T::classof(this))
+      return static_cast<const T *>(this);
+    return nullptr;
+  }
+
+  virtual void dump(llvm::raw_ostream &OS) const {
+    OS << "Fact (Kind: " << static_cast<int>(K) << ", Point: B"
+       << P.Block->getBlockID() << ":" << P.ElementIndex << ")\n";
+  }
+};
+
+class IssueFact : public Fact {
+  LoanID LID;
+  OriginID OID;
+
+public:
+  IssueFact(LoanID LID, OriginID OID, Point Pt)
+      : Fact(Kind::Issue, Pt), LID(LID), OID(OID) {}
+  LoanID getLoanID() const { return LID; }
+  OriginID getOriginID() const { return OID; }
+  static bool classof(const Fact *F) { return F->getKind() == Kind::Issue; }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "Issue (LoanID: " << getLoanID() << ", OriginID: " << getOriginID()
+       << ")\n";
+  }
+};
+
+class ExpireFact : public Fact {
+  LoanID LID;
+
+public:
+  ExpireFact(LoanID LID, Point Pt) : Fact(Kind::Expire, Pt), LID(LID) {}
+  LoanID getLoanID() const { return LID; }
+  static bool classof(const Fact *F) { return F->getKind() == Kind::Expire; }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "Expire (LoanID: " << getLoanID() << ")\n";
+  }
+};
+
+class AssignOriginFact : public Fact {
+  OriginID OIDDest;
+  OriginID OIDSrc;
+
+public:
+  AssignOriginFact(OriginID OIDDest, OriginID OIDSrc, Point Pt)
+      : Fact(Kind::AssignOrigin, Pt), OIDDest(OIDDest), OIDSrc(OIDSrc) {}
+  OriginID getDestOriginID() const { return OIDDest; }
+  OriginID getSrcOriginID() const { return OIDSrc; }
+  static bool classof(const Fact *F) {
+    return F->getKind() == Kind::AssignOrigin;
+  }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "AssignOrigin (DestID: " << getDestOriginID()
+       << ", SrcID: " << getSrcOriginID() << ")\n";
+  }
+};
+
+class ReturnOfOriginFact : public Fact {
+  OriginID OID;
+
+public:
+  ReturnOfOriginFact(OriginID OID, Point Pt)
+      : Fact(Kind::ReturnOfOrigin, Pt), OID(OID) {}
+  OriginID getReturnedOriginID() const { return OID; }
+  static bool classof(const Fact *F) {
+    return F->getKind() == Kind::ReturnOfOrigin;
+  }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "ReturnOfOrigin (OriginID: " << getReturnedOriginID() << ")\n";
+  }
+};
+
+class FactManager {
+public:
+  llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const {
+    auto It = BlockToFactsMap.find(B);
+    if (It != BlockToFactsMap.end())
+      return It->second;
+    return {};
+  }
+
+  void addFact(const CFGBlock *B, Fact *NewFact) {
+    BlockToFactsMap[B].push_back(NewFact);
+  }
+
+  template <typename FactType, typename... Args>
+  FactType *createFact(Args &&...args) {
+    void *Mem = FactAllocator.Allocate<FactType>();
+    return new (Mem) FactType(std::forward<Args>(args)...);
+  }
+
+  void dump(const CFG &Cfg, AnalysisDeclContext &AC) const {
+    llvm::dbgs() << "==========================================\n";
+    llvm::dbgs() << "       Lifetime Analysis Facts:\n";
+    llvm::dbgs() << "==========================================\n";
+    if (const Decl *D = AC.getDecl()) {
+      if (const auto *ND = dyn_cast<NamedDecl>(D))
+        llvm::dbgs() << "Function: " << ND->getQualifiedNameAsString() << "\n";
+    }
+    // Print blocks in the order as they appear in code for a stable ordering.
+    ForwardDataflowWorklist worklist(Cfg, AC);
+    for (const CFGBlock *B : Cfg.const_nodes())
+      worklist.enqueueBlock(B);
+    while (const CFGBlock *B = worklist.dequeue()) {
+      llvm::dbgs() << "  Block B" << B->getBlockID() << ":\n";
+      auto It = BlockToFactsMap.find(B);
+      if (It != BlockToFactsMap.end()) {
+        for (const Fact *F : It->second) {
+          llvm::dbgs() << "    ";
+          F->dump(llvm::dbgs());
+        }
+      }
+      llvm::dbgs() << "  End of Block\n";
+    }
+  }
+
+private:
+  llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<Fact *>>
+      BlockToFactsMap;
+  llvm::BumpPtrAllocator FactAllocator;
+};
+
+struct FactsContext {
+  FactManager Facts;
+  LoanManager Loans;
+  OriginManager Origins;
+};
+
+class FactGenerator : public ConstStmtVisitor<FactGenerator> {
+
+public:
+  FactGenerator(const CFG &Cfg, FactsContext &FactsCtx, AnalysisDeclContext &AC)
+      : FactsCtx(FactsCtx), Cfg(Cfg), AC(AC), CurrentBlock(nullptr) {}
+
+  void run() {
+    llvm::TimeTraceScope TimeProfile("Fact Generation");
+    // Iterate through the CFG blocks in pre-order to ensure that
+    // initializations and destructions are processed in the correct sequence.
+    // TODO: A pre-order traversal utility should be provided by Dataflow
+    // framework.
+    ForwardDataflowWorklist Worklist(Cfg, AC);
+    for (const CFGBlock *B : Cfg.const_nodes())
+      Worklist.enqueueBlock(B);
+    while (const CFGBlock *Block = Worklist.dequeue()) {
+      CurrentBlock = Block;
+      for (unsigned I = 0; I < Block->size(); ++I) {
+        const CFGElement &Element = Block->Elements[I];
+        CurrentPoint = Point(Block, I);
+        if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
+          Visit(CS->getStmt());
+        else if (std::optional<CFGAutomaticObjDtor> DtorOpt =
+                     Element.getAs<CFGAutomaticObjDtor>())
+          handleDestructor(*DtorOpt);
+      }
+    }
+  }
+
+  void VisitDeclStmt(const DeclStmt *DS) {
+    for (const Decl *D : DS->decls()) {
+      if (const auto *VD = dyn_cast<VarDecl>(D)) {
+        if (hasOrigin(VD->getType())) {
+          if (const Expr *InitExpr = VD->getInit()) {
+            OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
+            OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
+            FactsCtx.Facts.addFact(CurrentBlock,
+                                   FactsCtx.Facts.createFact<AssignOriginFact>(
+                                       DestOID, SrcOID, CurrentPoint));
+          }
+        }
+      }
+    }
+  }
+
+  void VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
+    if (!hasOrigin(ICE->getType()))
+      return;
+    // An ImplicitCastExpr node itself gets an origin, which flows from the
+    // origin of its sub-expression (after stripping its own parens/casts).
+    if (ICE->getCastKind() == CK_LValueToRValue) {
+      OriginID IceOID = FactsCtx.Origins.getOrCreate(ICE);
+      OriginID SubExprOID = FactsCtx.Origins.getOrCreate(ICE->getSubExpr());
+      FactsCtx.Facts.addFact(CurrentBlock,
+                             FactsCtx.Facts.createFact<AssignOriginFact>(
+                                 IceOID, SubExprOID, CurrentPoint));
+    }
+  }
+
+  void VisitUnaryOperator(const UnaryOperator *UO) {
+    if (UO->getOpcode() == UO_AddrOf) {
+      const Expr *SubExpr = UO->getSubExpr();
+      if (const auto *DRE = dyn_cast<DeclRefExpr>(SubExpr)) {
+        if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+          // Check if it's a local variable.
+          if (VD->hasLocalStorage()) {
+            OriginID OID = FactsCtx.Origins.getOrCreate(UO);
+            Path AddrOfLocalVarPath(VD, Path::Kind::StackVariable);
+            LoanInfo &Loan = FactsCtx.Loans.addLoanInfo(AddrOfLocalVarPath,
+                                                        UO->getOperatorLoc());
+            FactsCtx.Facts.addFact(CurrentBlock,
+                                   FactsCtx.Facts.createFact<IssueFact>(
+                                       Loan.ID, OID, CurrentPoint));
+          }
+        }
+      }
+    }
+  }
+
+  void VisitReturnStmt(const ReturnStmt *RS) {
+    if (const Expr *RetExpr = RS->getRetValue()) {
+      if (hasOrigin(RetExpr->getType())) {
+        OriginID OID = FactsCtx.Origins.getOrCreate(RetExpr);
+        FactsCtx.Facts.addFact(
+            CurrentBlock,
+            FactsCtx.Facts.createFact<ReturnOfOriginFact>(OID, CurrentPoint));
+      }
+    }
+  }
+
+  void VisitBinaryOperator(const BinaryOperator *BO) {
+    if (BO->isAssignmentOp()) {
+      const Expr *LHSExpr = BO->getLHS();
+      const Expr *RHSExpr = BO->getRHS();
+
+      // We are interested in assignments like `ptr1 = ptr2` or `ptr = &var`
+      // LHS must be a pointer/reference type that can be an origin.
+      // RHS must also represent an origin (either another pointer/ref or an
+      // address-of).
+      if (const auto *DRE_LHS = dyn_cast<DeclRefExpr>(LHSExpr)) {
+        if (const auto *VD_LHS =
+                dyn_cast<ValueDecl>(DRE_LHS->getDecl()->getCanonicalDecl());
+            VD_LHS && hasOrigin(VD_LHS->getType())) {
+          OriginID DestOID = FactsCtx.Origins.getOrCreate(VD_LHS);
+          OriginID SrcOID = FactsCtx.Origins.getOrCreate(RHSExpr);
+          FactsCtx.Facts.addFact(CurrentBlock,
+                                 FactsCtx.Facts.createFact<AssignOriginFact>(
+                                     DestOID, SrcOID, CurrentPoint));
+        }
+      }
+    }
+  }
+
+private:
+  // Check if a type have an origin.
+  bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); }
+
+  void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) {
+    /// TODO: Also handle trivial destructors (e.g., for `int`
+    // variables) which will never have a CFGAutomaticObjDtor node.
+    /// TODO: Handle loans to temporaries.
+    const VarDecl *DestructedVD = DtorOpt.getVarDecl();
+    if (!DestructedVD)
+      return;
+    // Iterate through all loans to see if any expire.
+    for (const LoanInfo &Loan : FactsCtx.Loans.getLoanInfos()) {
+      const Path &LoanPath = Loan.SourcePath;
+      // Check if the loan is for a stack variable and if that variable
+      // is the one being destructed.
+      if (LoanPath.PathKind == Path::Kind::StackVariable) {
+        if (LoanPath.D == DestructedVD) {
+          FactsCtx.Facts.addFact(
+              CurrentBlock,
+              FactsCtx.Facts.createFact<ExpireFact>(Loan.ID, CurrentPoint));
+        }
+      }
+    }
+  }
+
+  FactsContext &FactsCtx;
+  const CFG &Cfg;
+  AnalysisDeclContext &AC;
+  const CFGBlock *CurrentBlock;
+  Point CurrentPoint;
+};
+
+// ========================================================================= //
+//                              The Dataflow Lattice
+// ========================================================================= //
+
+// Using LLVM's immutable collections is efficient for dataflow analysis
+// as it avoids deep copies during state transitions.
+// TODO(opt): Consider using a bitset to represent the set of loans.
+using LoanSet = llvm::ImmutableSet<LoanID>;
+using OriginLoanMap = llvm::ImmutableMap<OriginID, LoanSet>;
+
+/// A context object to hold the factories for immutable collections, ensuring
+/// that all created states share the same underlying memory management.
+struct LifetimeFactory {
+  OriginLoanMap::Factory OriginMapFact;
+  LoanSet::Factory LoanSetFact;
+
+  LoanSet createLoanSet(LoanID LID) {
+    return LoanSetFact.add(LoanSetFact.getEmptySet(), LID);
+  }
+};
+
+/// LifetimeLattice represents the state of our analysis at a given program
+/// point. It is an immutable object, and all operations produce a new
+/// instance rather than modifying the existing one.
+struct LifetimeLattice {
+  /// The map from an origin to the set of loans it contains.
+  /// TODO(opt): To reduce the lattice size, propagate origins of declarations,
+  /// not expressions, because expressions are not visible across blocks.
+  OriginLoanMap Origins = OriginLoanMap(nullptr);
+
+  explicit LifetimeLattice(const OriginLoanMap &S) : Origins(S) {}
+  LifetimeLattice() = default;
+
+  bool operator==(const LifetimeLattice &Other) const {
+    return Origins == Other.Origins;
+  }
+  bool operator!=(const LifetimeLattice &Other) const {
+    return !(*this == Other);
+  }
+
+  LoanSet getLoans(OriginID OID, LifetimeFactory &Factory) const {
+    if (auto *Loans = Origins.lookup(OID))
+      return *Loans;
+    return Factory.LoanSetFact.getEmptySet();
+  }
+
+  /// Computes the union of two lattices by performing a key-wise merge of
+  /// their OriginLoanMaps.
+  LifetimeLattice merge(const LifetimeLattice &Other,
+                        LifetimeFactory &Factory) const {
+    /// Merge the smaller map into the larger one ensuring we iterate over the
+    /// smaller map.
+    if (Origins.getHeight() < Other.Origins.getHeight())
+      return Other.merge(*this, Factory);
+
+    OriginLoanMap MergedState = Origins;
+    // For each origin in the other map, union its loan set with ours.
+    for (const auto &Entry : Other.Origins) {
+      OriginID OID = Entry.first;
+      LoanSet OtherLoanSet = Entry.second;
+      MergedState = Factory.OriginMapFact.add(
+          MergedState, OID,
+          merge(getLoans(OID, Factory), OtherLoanSet, Factory));
+    }
+    return LifetimeLattice(MergedState);
+  }
+
+  LoanSet merge(LoanSet a, LoanSet b, LifetimeFactory &Factory) const {
+    /// Merge the smaller set into the larger one ensuring we iterate over the
+    /// smaller set.
+    if (a.getHeight() < b.getHeight())
+      std::swap(a, b);
+    LoanSet Result = a;
+    for (LoanID LID : b) {
+      /// TODO(opt): Profiling shows that this loop is a major performance
+      /// bottleneck. Investigate using a BitVector to represent the set of
+      /// loans for improved merge performance.
+      Result = Factory.LoanSetFact.add(Result, LID);
+    }
+    return Result;
+  }
+
+  void dump(llvm::raw_ostream &OS) const {
+    OS << "LifetimeLattice State:\n";
+    if (Origins.isEmpty())
+      OS << "  <empty>\n";
+    for (const auto &Entry : Origi...
[truncated]

@usx95 usx95 requested a review from Xazax-hun June 10, 2025 05:49
Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a very rudimentary first pass, will take a second look a bit later.

@@ -2842,6 +2844,12 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
}
}

DEBUG_WITH_TYPE(
"ExperimentalLifetimeAnalysis", if (S.getLangOpts().CPlusPlus) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is restricting this to C++ a temporary or is it not planned to support other languages like C?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary to initially scope it to C++. Added a TODO to enable it later.

namespace clang {
namespace {

struct Point {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to add a comment to specify how to interpret this point. Is this the program point before or after the element?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I wonder if Point is self-descriptive enough or whether this should be ProgramPoint. Does not have a strong feeling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed Point at the moment. I think this can be added later when this actually looks useful. At the moment it is primarily mapping Facts to CFGStmt which is not used.

struct Path {
const clang::ValueDecl *D;

enum class Kind : uint8_t {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes we do not know where does the storage location live. Do we plan to have Unknown kinds?

Also, sometimes we might learn more about a location during analysis. E.g., in a branch if (p == q), the two pointers must have the same Kind. Or if we see delete p; we learned that p must be a heap location.

But I am wondering what is this information used for? If we use it to derive lifetime facts, knowing that a variable is on the stack might not be sufficient. We might want to know the exact scope information, see the difference between p and q:

{ int p; }
int q;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to have Unknown kinds?

Yes. I have a TODO to represent Opaque loans in LoanInfo where the AccessPath is not known.

But I am wondering what is this information used for?

The Path is primarily used to generate 2 facts: IssueLoan and Expireloan. When we see a destructor/lifetime end, we find the loans with paths which just got destroyed. These loans should not expire. Each of the entities like stack variable, temporaries, fields (of stack variables) would need to be handled slightly differently and this is only where this PathKind would be used.

Also, sometimes we might learn more about a location during analysis. E.g., in a branch if (p == q), the two pointers must have the same Kind. Or if we see delete p; we learned that p must be a heap location.

I do not have plans to modify the loans and facts based on the control flow or the path. I would prefer to not consider path-sensitive information to keep the analysis simple. Once the facts are generated, the dataflow would not generate new facts.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of the entities like stack variable, temporaries, fields (of stack variables) would need to be handled slightly differently and this is only where this PathKind would be used.

I am not yet convinced that we will actually need this, but this may be because we only have one kind implemented so far:

      if (LoanPath.PathKind == AccessPath::Kind::StackVariable) {
         if (LoanPath.D == DestructedVD) {

Here, I'd imagine when we issue the ExpireLoan, we know exactly what kind of entity is expiring. The AST/CFG should tell us if it was a stack variable, a heap location, or a temporary. After that, I'd expect that the AccessPath should have all the information we need to check what Loans should expire here (e.g., the fields of an expired loan should also expire).

};
OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
: ID(id), Kind(kind), Decl(D) {}
OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to create a Variable OriginKind with an Expr? If not, I think the ctor should set the right OriginKind and not leave this to the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


private:
LoanID NextLoanIDVal = 0;
llvm::SmallVector<LoanInfo> AllLoans;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not important to address now, but I wonder whether small buffer optimisations are beneficial here. I can imagine that most functions might end up generating enough loans for us to heap allocate. But we can measure this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG. Added a TODO to reconsider this during performance evaluations.


private:
OriginID NextOriginIDVal = 0;
llvm::SmallVector<OriginInfo> AllOrigins;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concerns about the usefulness of small buffer optimisation here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a todo to profile and reconsider later.


OriginID getNextOriginID() { return NextOriginIDVal++; }
OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) {
assert(D != nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we take a reference instead if it cannot be null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Dataflow.run();
DEBUG_WITH_TYPE("LifetimeDataflow", Dataflow.dump());
}
} // namespace clang
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing new line at the end of the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I don't have any major comments on the design, I have mostly some nits inline.

namespace clang {
namespace {

struct Point {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I wonder if Point is self-descriptive enough or whether this should be ProgramPoint. Does not have a strong feeling.

/// Index into Block->Elements().
unsigned ElementIndex;

Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: do we need this ctor? Alternatively, we could just have some default member initializers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added default member init and removed ctor.

Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
: Block(B), ElementIndex(Idx) {}

bool operator==(const Point &Other) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Point is fairly small, we could take it by value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(obsolete)


class FactManager {
public:
llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could facts change after they are added? Should we always have const Fact *?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. No they cannot change. Done.

return;
// An ImplicitCastExpr node itself gets an origin, which flows from the
// origin of its sub-expression (after stripping its own parens/casts).
if (ICE->getCastKind() == CK_LValueToRValue) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test so I can see how this works?

int a;
int *p = &a;
int **pp = &p;
int *q = *pp; 

bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); }

void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) {
/// TODO: Also handle trivial destructors (e.g., for `int`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked at clang::CFG::BuildOptions::AddLifetime? That should generate CFGElement::LifetimeEnds for primitive types too.

struct LifetimeLattice {
/// The map from an origin to the set of loans it contains.
/// TODO(opt): To reduce the lattice size, propagate origins of declarations,
/// not expressions, because expressions are not visible across blocks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because expressions are not visible across blocks

This is not entirely true, e.g., f(cond ? *p : *q); Here, *p and *q are not declarations but they are visible across blocks. That being said, there are a limited number of ways this can happen, so we still can make the optimization suggested here. It is just a bit more elaborate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to take a look at how this is handled in the dataflow framework. Martin Braenne worked this out with care -- it's not trivial.


OriginLoanMap MergedState = Origins;
// For each origin in the other map, union its loan set with ours.
for (const auto &Entry : Other.Origins) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I wonder if the merging could be more efficient if it was implemented in ImmutableMap proper. Not for this PR but a potential optimization in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree. I highly suspect the current merge performance will become a bottleneck for meeting default enablement standards. Merging is by far our most expensive operation.

A merge inside ImmutableMap would be the right place for this optimization. However, it can't be implemented efficiently with the current llvm::ImmutableMap, which is backed by an AVL tree. The issue is that an AVL tree's structure is heavily dependent on the history of insertions and deletions due to self-balancing rotations. Two maps with many shared keys can have completely different tree shapes, making it impossible to identify and reuse identical subtrees.

For an efficient merge, a persistent HAMT (Hash Array Mapped Trie) would be the ideal backing data structure. An HAMT's structure is determined by its keys' hashes, not insertion order. This property allows a merge algorithm to find identical subtrees and share them, performing work proportional only to the O(difference) between the two maps. In our use case, this difference is expected to be small: roughly the size of the branch being merged (as compared to the size of the function).

I will mention this and leave it for a future PR.

@ymand was this the data structure you were referring to last time we talked ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what is the relationship between HAMT and Patricia Trees, but the latter one is frequently used in dataflow analysis for its great merge performance.

There is one implementation here with a reference to the original paper: https://github.com/facebook/SPARTA/blob/42d4ccd2e5e0ea31ee7969578a685e1d0407c206/include/sparta/PatriciaTreeCore.h#L35

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Both of them are essentially tries, Patricia is a compressed binary trie while HAMT has a wide branching factor so should be subtly comparable. I can imagine both of them to support this set-difference-based merging algorithm. Will keep in mind for later. Thanks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of Union-Find, but sounds like Patricia Tries are the way to go (I remember really liking this data structure when I first learned about it my first year in college! :))

Transferer Xfer;

/// Stores the merged analysis state at the entry of each CFG block.
llvm::DenseMap<const CFGBlock *, LifetimeLattice> BlockEntryStates;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we save lookups if we had something akin to llvm::DenseMap<const CFGBlock *, std::pair<LifetimeLattice, LifetimeLattice>>?

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one question though. I sort of expected this to require a liveness analysis. Is this something we will need down the line? Or is that not required for polonius?

@usx95
Copy link
Contributor Author

usx95 commented Jun 10, 2025

I have one question though. I sort of expected this to require a liveness analysis. Is this something we will need down the line? Or is that not required for polonius?

Good point. This is actually going to be a part of the error reporting. And yes, polonius does have a notion of live origins/loans.
I can think of primarily two different strategies for generating diagnostics.

  1. (preferred) Decide at the point where a loan expires: What are the origins that contain this expired loan and is still used later at some point. This would need a liveness analysis, i.e., it is an error to have a live origin which holds an expired loan.
  2. Decide at the point where an origin is used: It is an error, if there is a use of an origin which holds an expired loan. The current dataflow results are enough to power this but we would need to remember which expired loans are already reported so that we do not error on each use of an origin holding an expired loan.

I prefer approach 1 as this more directly works on the invalid memory as compared to origins carrying invalid loans to a potential use.
I have left this out from the current implementation but let me know if you would find it helpful to implement a warning (maybe return-stack-addr) in this patch.

@Xazax-hun
Copy link
Collaborator

implement a warning (maybe return-stack-addr) in this patch.

This patch is already big enough. But I think a small comment of how reporting will work down the line might be useful.

Copy link
Collaborator

@ymand ymand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks very good. I would recommend you consider splitting this into multiple smaller PRs, so reviews can focus on a specific aspect. E.g. the FactGenerator and the LifetimeDataflow seems to deserve focused reviews.

/// Represents the storage location being borrowed, e.g., a specific stack
/// variable.
/// TODO: Handle member accesseslike `s.y`.
struct Path {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like Gabor's point above, maybe a more specific name, like AccessPath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 53 to 54
using LoanID = uint32_t;
using OriginID = uint32_t;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From experience, it's better to use a proper data type for IDs. Using integers directly almost always causes pain later (say if you need to change the underlying type, or if you want to prevent arithmetic, etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

: ID(id), SourcePath(path), IssueLoc(loc) {}
};

enum class OriginKind : uint8_t { Variable, ExpressionResult };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a variable (reference) is an expression, this is bit surprising. Is the Variable case referring to the variable declaration? If so, maybe make that expression (like Declaration vs Expression instead)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah yes. This might be confusing. Yes we are referring to the Declaration here.

Comment on lines 77 to 81
OriginKind Kind;
union {
const clang::ValueDecl *Decl;
const clang::Expr *Expression;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider using a std::variant instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah thanks. I think I missed this one. Switched to the llvm::PointerUnion

return AllLoans.back();
}

const LoanInfo *getLoanInfo(LoanID id) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any situation in which you expect it valid to provide an id that's not in AllLoans? If not, consider returning a reference instead and CHECK failing on bad ids.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (hasOrigin(VD->getType())) {
if (const Expr *InitExpr = VD->getInit()) {
OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect that InitExpr has already been visited at this point? Put another way, why the flexible getOrCreate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Good point. I think check-failing on non-existing source originID would help us catch the parts of unhandled AST. Added.

struct LifetimeLattice {
/// The map from an origin to the set of loans it contains.
/// TODO(opt): To reduce the lattice size, propagate origins of declarations,
/// not expressions, because expressions are not visible across blocks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to take a look at how this is handled in the dataflow framework. Martin Braenne worked this out with care -- it's not trivial.


/// Computes the union of two lattices by performing a key-wise merge of
/// their OriginLoanMaps.
LifetimeLattice merge(const LifetimeLattice &Other,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since this is a dataflow computation, I would stick w/ standard terminology (assuming it fits): join.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

LifetimeLattice NewSuccEntryState =
OldSuccEntryState.merge(ExitState, LifetimeFact);
if (SuccIt == BlockEntryStates.end() ||
NewSuccEntryState != OldSuccEntryState) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance optimization (potentially for a separate PR): If this equality operation isn't cheap, then consider changing the merge operation to indicate whether it produced any changes. This is often cheaper to compute as part of merging (if a little more of a nuisance).

/// Orchestrates the analysis by iterating over the CFG using a worklist
/// algorithm. It computes a fixed point by propagating the LifetimeLattice
/// state through each block until the state no longer changes.
/// TODO: Maybe use the dataflow framework!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the current form, where you use CFGBlocks to drive the computation, using the framework would make sense. But, if you're attached to the current algorithm (comparing at the before state rather than the after state), you'll need to implement it as an alternative (or modify the current one), as we discussed in our last meeting.


LoanInfo &addLoanInfo(Path path, SourceLocation loc) {
NextLoanIDVal++;
AllLoans.emplace_back(NextLoanIDVal, path, loc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the case that this starts at LoanId 1 instead of 0 (because the NextLoanIDVal++ happens first)?

Does that affect the getLoanInfo() indexing?

// Origin of DeclRefExpr is that of the declaration it refers to.
return getOrCreate(DRE->getDecl());
}
OriginID NewID = getNextOriginID();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question -- would this start at 1 instead of 0, and does that affect getOriginInfo(id) indexing?

Copy link
Contributor Author

@usx95 usx95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed few comments.

/// Index into Block->Elements().
unsigned ElementIndex;

Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added default member init and removed ctor.

namespace clang {
namespace {

struct Point {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed Point at the moment. I think this can be added later when this actually looks useful. At the moment it is primarily mapping Facts to CFGStmt which is not used.

};
OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
: ID(id), Kind(kind), Decl(D) {}
OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -2842,6 +2844,12 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
}
}

DEBUG_WITH_TYPE(
"ExperimentalLifetimeAnalysis", if (S.getLangOpts().CPlusPlus) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is temporary to initially scope it to C++. Added a TODO to enable it later.

Dataflow.run();
DEBUG_WITH_TYPE("LifetimeDataflow", Dataflow.dump());
}
} // namespace clang
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 276 to 278
void addFact(const CFGBlock *B, Fact *NewFact) {
BlockToFactsMap[B].push_back(NewFact);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.


struct FactsContext {
FactManager Facts;
LoanManager Loans;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Moved LoanManager and OriginManager into FactManager.

Comment on lines 333 to 334
// TODO: A pre-order traversal utility should be provided by Dataflow
// framework.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I fell for "Reverse postordering is not the same as preordering". I meant reverse post-order which gives a topological order.
What I expect is for the dataflow framework or the CFG to provide this iteration order utility. Right now I have to use a worklist to get this effect which involves queuing all the nodes followed by deqeueing all nodes.

if (hasOrigin(VD->getType())) {
if (const Expr *InitExpr = VD->getInit()) {
OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Good point. I think check-failing on non-existing source originID would help us catch the parts of unhandled AST. Added.

struct LoanTag {};
struct OriginTag {};

using LoanID = ID<LoanTag>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could even do using LoanID = ID<struct LoanTag>;, so no separate definition is required for these tags.

public:
OriginManager() = default;

OriginID getNextOriginID() { return ++NextOriginID; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this and the below APIs to be public? I'd expect users to only rely on the higher level get and getOrCreate methods.

}

void VisitCXXNullPtrLiteralExpr(const CXXNullPtrLiteralExpr *N) {
/// TODO: Handle nullptr expr as a special 'null' loan. Uninintialed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: Uninintialed -> Uninitialized

struct Path {
const clang::ValueDecl *D;

enum class Kind : uint8_t {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of the entities like stack variable, temporaries, fields (of stack variables) would need to be handled slightly differently and this is only where this PathKind would be used.

I am not yet convinced that we will actually need this, but this may be because we only have one kind implemented so far:

      if (LoanPath.PathKind == AccessPath::Kind::StackVariable) {
         if (LoanPath.D == DestructedVD) {

Here, I'd imagine when we issue the ExpireLoan, we know exactly what kind of entity is expiring. The AST/CFG should tell us if it was a stack variable, a heap location, or a temporary. After that, I'd expect that the AccessPath should have all the information we need to check what Loans should expire here (e.g., the fields of an expired loan should also expire).

Comment on lines 333 to 334
// TODO: A pre-order traversal utility should be provided by Dataflow
// framework.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could do that via: AnalysisDeclContext::getAnalysis<PostOrderCFGView>(). But maybe it would be worth to have a convenience method added to the worklist implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants